llm: LoRA support #4

Open · wants to merge 10 commits into llm
Conversation

kyriediculous

This PR introduces support for Low-Rank Adaptation (LoRA) in our LLM inference pipeline, allowing for dynamic model adaptation at inference time. Key changes include:

  1. Enhanced LLMGeneratePipeline:

    • Added LoRA weight application functionality (see the pipeline sketch after this list)
    • Implemented a queue system for handling concurrent requests with different LoRA weights
    • Retained all existing model loading and optimization strategies (8-bit quantization, fp16/bf16 loading, distributed loading)
  2. Updated API route:

    • Added support for LoRA weights as an optional parameter
    • Implemented validation for the LoRA weight input, which must be a base64-encoded string (see the route sketch after this list)
  3. Maintained compatibility:

    • Preserved all existing pipeline functionality (memory management, device mapping, stable-fast optimization)
    • Ensured backward compatibility for requests without LoRA weights
  4. Updated dependencies:

    • Verified and updated requirements.txt to support new functionality
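
A minimal sketch of how these pipeline changes could fit together, assuming a transformers/PEFT-based stack. The class skeleton, the lock-based queue, and the `_unpack_adapter` helper are illustrative assumptions, not the PR's actual code:

```python
import asyncio
import base64
import tempfile
from typing import Optional

from peft import PeftModel


class LLMGeneratePipeline:
    """Illustrative skeleton; the real pipeline also handles 8-bit
    quantization, fp16/bf16 loading, and distributed loading as noted above."""

    def __init__(self, model, tokenizer):
        self.model = model          # base transformers model, already loaded
        self.tokenizer = tokenizer
        self._lora_lock = asyncio.Lock()  # serializes adapter swaps across requests

    async def generate(self, prompt: str, lora_weights: Optional[str] = None) -> str:
        if lora_weights is None:
            # Backward-compatible path: no LoRA, run the base model directly.
            return self._run(self.model, prompt)
        async with self._lora_lock:  # queue concurrent requests with different adapters
            adapted = self._apply_lora(lora_weights)
            try:
                return self._run(adapted, prompt)
            finally:
                adapted.unload()  # restore the base model for the next request

    def _apply_lora(self, lora_weights_b64: str) -> PeftModel:
        """Decode the base64 payload and attach the adapter with PEFT."""
        raw = base64.b64decode(lora_weights_b64)
        with tempfile.TemporaryDirectory() as tmp:
            _unpack_adapter(raw, tmp)  # hypothetical helper: write adapter files to tmp
            return PeftModel.from_pretrained(self.model, tmp)

    def _run(self, model, prompt: str) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=256)
        return self.tokenizer.decode(out[0], skip_special_tokens=True)
```

Serializing adapter swaps behind a single lock is the simplest queueing strategy; a production implementation might instead batch requests per adapter or cache recently used adapters.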
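
On the API route side, the optional parameter and its base64 validation could look like the following FastAPI sketch; the endpoint path and the `lora_weights` field name are assumptions rather than the PR's actual schema:

```python
import base64
import binascii
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class LLMRequest(BaseModel):
    prompt: str
    lora_weights: Optional[str] = None  # optional base64-encoded adapter payload


@app.post("/llm")
async def llm_generate(req: LLMRequest):
    if req.lora_weights is not None:
        try:
            # validate=True rejects any string containing non-base64 characters.
            base64.b64decode(req.lora_weights, validate=True)
        except (binascii.Error, ValueError):
            raise HTTPException(
                status_code=400,
                detail="lora_weights must be a valid base64-encoded string",
            )
    # ... hand the request off to the pipeline here ...
    return {"status": "ok"}
```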

These changes enable users to apply custom LoRA weights to the base model at inference time, tailoring model behavior without retraining or reloading the underlying model. This feature enhances the flexibility of our inference API while maintaining its performance and efficiency.
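
For example, a client could base64-encode a local adapter file and send it along with the prompt (the endpoint and field names follow the hypothetical schema sketched above):

```python
import base64
import requests

with open("adapter.safetensors", "rb") as f:  # hypothetical local adapter file
    lora_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:8000/llm",
    json={"prompt": "Summarize LoRA in one sentence.", "lora_weights": lora_b64},
)
print(resp.json())
```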

Testing:

  • Verified functionality with and without LoRA weights
  • Tested concurrent requests with different LoRA weights
  • Ensured compatibility with existing pipelines and configurations

Open TODOs:

  • Comprehensive testing with various model sizes and LoRA configurations
  • Performance benchmarking to assess impact on inference speed
  • Documentation update to explain LoRA usage in API calls
